A new multidimensional model with text dimensions: definition and implementation
We present a new multidimensional model with textual dimensions based on a knowledge structure extracted
from the texts, in which any textual attribute in a database can be processed, not only XML texts.
This dimension allows textual data to be treated in the same way as non-textual data, automatically and
without user intervention, so all the classical operations of the multidimensional model can be
defined for this textual dimension. While most of the models dealing with texts that can be found in the
literature are not implemented, in this proposal the multidimensional model and the OLAP system have
been implemented in a software tool, so they can be tested on real data. A case study with medical data is
included in this work.
Junta de Andalucia P07-TIC02786, P10-TIC6109, P11-TIC746
International Research Funding
Organized by: UGR International Projects Office (Oficina de Proyectos Internacionales UGR), UGR Welcome Center, Agencia Andaluza del Conocimiento, and Fundación Española para la Ciencia y la Tecnología (FECYT). The session shows how research can be financed through international funds.
Non-Query-Based Pattern Mining and Sentiment Analysis for Massive Microblogging Online Texts
Pattern mining has been widely studied in the last decade given its great interest for research and its numerous real-world applications. In this paper, definitions of query-based and non-query-based systems are proposed, highlighting the need for non-query-based systems in the era of Big Data. To this end, we propose a new non-query-based system that combines association rules, generalized rules and sentiment analysis in order to catalogue and discover opinion patterns in the social network Twitter. Association rules have previously been applied to sentiment analysis, but in most cases they are used once the sentiment analysis process has finished, to see which tokens commonly appear in relation to a certain sentiment. They have also been used to discover patterns between sentiments. Our work differs from these in that it proposes a non-query-based system which combines both techniques, in a mixed proposal of sentiment analysis and association rules, to discover patterns and sentiment patterns in microblogging texts. The obtained rules generalize and summarize the sentiments obtained from a group of tweets about any character, brand or product mentioned in them. To study the performance of the proposed system, an initial set of 1.7 million tweets has been employed to analyse the most salient sentiments during the American pre-election campaign. The analysis of the obtained results supports the system's ability to obtain association rules and patterns with great descriptive value in this use case. Parallels can be drawn in these patterns that match real-life events perfectly.
COPKIT Project, through the European Union's Horizon 2020 Research and Innovation Programme 786687
Spanish Ministry for Economy and Competitiveness TIN2015-64776-C3-1-R
Andalusian Government, through Data Analysis in Medicine: from Medical Records to Big Data Project P18-RT-2947
Spanish Ministry of Education, Culture, and Sport FPU18/00150
University of Granad
NOFACE: A new framework for irrelevant content filtering in social media according to credibility and expertise
Social networks have taken an irreplaceable role in our lives. They are used daily by millions of people
to communicate and inform themselves. This success has also led to a lot of irrelevant content and even
misinformation on social media. In this paper, we propose a user-centred framework to reduce the amount
of irrelevant content in social networks to support further stages of data mining processes. The system also
helps in the reduction of misinformation in social networks, since it selects credible and reputable users. The
system is based on the belief that if a user is credible then their content will be credible. Our proposal uses
word embeddings in a first stage, to create a set of interesting users according to their expertise. After that, in
a later stage, it employs social network metrics to further narrow down the relevant users according to their
credibility in the network. To validate the framework, it has been tested on two real Big Data problems on
Twitter: one related to COVID-19 tweets and the other to the last United States elections on 3rd November. Both
are problems in which finding relevant content may be difficult due to the large amount of data published
in recent years. The proposed framework, called NOFACE, reduces the number of irrelevant users posting
about the topic, keeping only those with higher credibility, and thus provides interesting information
about the selected topic. This reduces irrelevant information and thereby mitigates the presence
of misinformation in subsequent data mining, improving the obtained results, as
illustrated for the two topics mentioned using clustering, association rules and LDA techniques.
European Commission 786687
Andalusian government
FEDER operative program P18-RT-2947
B-TIC-145-UGR18
University of Granada's internal plan PPJIB2021-04
Spanish Government FPU18/0015
Spark solutions for discovering fuzzy association rules in Big Data
The research reported in this paper was partially supported by the COPKIT project from the 8th Framework Programme (H2020) research and innovation programme (grant agreement No 786687) and by the BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947.
The computational cost of mining fuzzy association rules grows significantly with very large data sets, in many cases triggering a memory overflow error that makes the experiment fail before it concludes. In these cases, Big Data techniques can help the experiment run to completion. Therefore, in this paper several Spark algorithms are proposed to handle massive fuzzy data and discover interesting association rules. To that end, we rely on a decomposition of interestingness measures in terms of α-cuts, and we experimentally demonstrate that it is sufficient to consider only 10 equidistributed α-cuts in order to mine all significant fuzzy association rules. Additionally, all the proposals are compared and analysed in terms of efficiency and speed-up on several datasets, including a real dataset comprising sensor measurements from an office building.
COPKIT project from the 8th Framework Programme (H2020) research and innovation programme 786687
BIGDATAMED projects B-TIC-145-UGR18, P18-RT-294
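The α-cut decomposition mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's Spark implementation: it runs on a single machine, it assumes each transaction maps items to membership degrees in [0, 1], and it assumes the fuzzy support is approximated by averaging the crisp supports obtained at k equidistant α levels. All function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: a fuzzy transaction maps each item to a membership degree in [0, 1].
Transaction = Dict[str, float]


def alpha_levels(k: int = 10) -> List[float]:
    """Return k equidistant alpha levels: 1/k, 2/k, ..., 1."""
    return [(i + 1) / k for i in range(k)]


def crisp_support(transactions: List[Transaction],
                  itemset: Iterable[str],
                  alpha: float) -> float:
    """Support of the itemset in the alpha-cut: a transaction counts when
    every item in the itemset has membership degree >= alpha."""
    items = list(itemset)
    hits = sum(
        1 for t in transactions
        if all(t.get(item, 0.0) >= alpha for item in items)
    )
    return hits / len(transactions)


def fuzzy_support(transactions: List[Transaction],
                  itemset: Iterable[str],
                  k: int = 10) -> float:
    """Approximate fuzzy support as the mean of the crisp supports
    over k equidistant alpha-cuts (each level weighted by 1/k)."""
    items = list(itemset)
    levels = alpha_levels(k)
    return sum(crisp_support(transactions, items, a) for a in levels) / k


# Made-up example data, for illustration only.
example = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.5},
    {"a": 0.0, "b": 1.0},
]
support_a = fuzzy_support(example, ["a"])
support_ab = fuzzy_support(example, ["a", "b"])
```

Because each α-cut is an ordinary crisp dataset, every level can be mined with a standard crisp algorithm and the per-level results combined afterwards, which is what makes this kind of decomposition attractive for distributed processing.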
New Spark solutions for distributed frequent itemset and association rule mining algorithms
Funding for open access publishing: Universidad de Granada/CBUA. The research reported in this paper was partially
supported by the BIGDATAMED project, which has received funding
from the Andalusian Government (Junta de Andalucía) under grant
agreement No P18-RT-1765, by Grants PID2021-123960OB-I00 and
TED2021-129402B-C21 funded by Ministerio de Ciencia e
Innovación, by "ERDF A way of making Europe", and by the
European Union NextGenerationEU. In addition, this work has been
partially supported by the Ministry of Universities through the EU-funded
Margarita Salas programme NextGenerationEU.
The large amount of data generated every day makes it necessary to re-implement methods so that they can handle
massive data efficiently. This is the case of Association Rules, an unsupervised data mining tool capable of extracting information
in the form of IF-THEN patterns. Although several methods have been proposed for the extraction of frequent itemsets (the
phase preceding association rule mining) in very large databases, the high computational cost and lack of memory remain a major
problem to be solved when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for
frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed
computing, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with
existing proposals, varying the number of transactions and the number of items. For this purpose, we have used the Spark platform,
which has been demonstrated to outperform existing distributed algorithmic implementations.
Universidad de Granada/CBUA
Junta de Andalucia P18-RT-1765
Ministry of Science and Innovation, Spain (MICINN)
Instituto de Salud Carlos III
Spanish Government
PID2021-123960OB-I00, TED2021-129402B-C21
ERDF A way of making Europe
European Union NextGenerationEU
Ministry of Universities through the E
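The two phases named in this abstract, frequent itemset extraction followed by association rule mining, can be sketched in plain Python. This is a small single-machine, Apriori-style illustration, not the distributed Spark algorithms the paper proposes. The function names, thresholds and toy transactions are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def frequent_itemsets(transactions: List[Set[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise (Apriori-style) search: return every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent: Dict[FrozenSet[str], float] = {}
    current = [frozenset([item]) for item in items]
    while current:
        # Count each candidate and keep those that reach the threshold.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        joined = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


def association_rules(frequent: Dict[FrozenSet[str], float],
                      min_confidence: float
                      ) -> List[Tuple[FrozenSet[str], FrozenSet[str], float, float]]:
    """Derive IF-THEN rules (antecedent, consequent, support, confidence)
    from the frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already available.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Made-up transactions, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
```

On this toy data the only rule that reaches 0.9 confidence is IF butter THEN bread. The counting step inside the loop is the costly part on large data, which is the part a distributed implementation would parallelize.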
Rules and fuzzy rules in text: concept, extraction and usage
Several concepts and techniques have been imported from other disciplines such as
Machine Learning and Artificial Intelligence to the field of textual data. In this paper,
we focus on the concept of rule and the management of uncertainty in text applications.
The different structures considered for the construction of the rules, the extraction of the
knowledge base and the applications and usage of these rules are detailed. We include a
review of the most relevant works of the different types of rules based on their representation
and their application to most of the common tasks of Information Retrieval
such as categorization, indexing and classification.
A Word Embedding-Based Method for Unsupervised Adaptation of Cooking Recipes
Studying food recipes is indispensable to understand the science of cooking. An essential
problem in food computing is the adaptation of recipes to user needs and preferences. The main difficulty
when adapting recipes is determining ingredient relations, which are compound and hard to interpret.
Word embedding models can capture the semantics of food items in a recipe, helping to understand how
ingredients are combined and substituted. In this work, we propose an unsupervised method for adapting
ingredient recipes to user preferences. To learn food representations and relations, we create and apply a
domain-specific word embedding model. In contrast to previous works, we use not only the list of ingredients
to train the model but also the cooking instructions. We enrich the ingredient data by mapping them to
a nutrition database to guide the adaptation and find ingredient substitutes. We performed three different
kinds of recipe adaptation based on nutrition preferences, adapting to similar ingredients, and vegetarian and
vegan diet restrictions. With 95% confidence, our method can obtain quality adapted recipes without
prior knowledge extraction in the recipe adaptation domain. Our results confirm the potential of using a
domain-specific semantic model to tackle the recipe adaptation task.
European Commission 816303
University of Granad
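The idea of finding ingredient substitutes from word embeddings can be illustrated with a minimal Python sketch. It assumes substitutes are ranked by cosine similarity and filtered by a set of allowed ingredients (for example, those compatible with a diet restriction). The abstract does not specify the ranking or filtering procedure, so this is only one plausible reading. The vectors below are made up for the example and do not come from any trained model.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def substitutes(target: str,
                embeddings: Dict[str, Sequence[float]],
                allowed: Set[str],
                top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank the allowed ingredients by similarity to the target ingredient."""
    scores = [
        (name, cosine(embeddings[target], vector))
        for name, vector in embeddings.items()
        if name != target and name in allowed
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical 3-dimensional embeddings, invented for illustration.
toy_embeddings = {
    "butter": [0.9, 0.1, 0.0],
    "margarine": [0.8, 0.2, 0.1],
    "olive_oil": [0.6, 0.1, 0.5],
    "chicken": [0.0, 0.9, 0.2],
}
# Hypothetical set of ingredients permitted by the user's restriction.
allowed_ingredients = {"margarine", "olive_oil"}
ranked = substitutes("butter", toy_embeddings, allowed_ingredients)
```

In a real setting the `allowed` set would be derived from external data such as a nutrition database, and the embeddings from a model trained on recipe text.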
Evolutionary Approach for Building, Exploring and Recommending Complex Items With Application in Nutritional Interventions
Over the last few years, the ability of recommender systems to help us in different environments
has been increasing. Several systems try to offer solutions in highly complex environments such as nutrition,
housing, or traveling. In this paper, we present a recommendation system capable of using different
input sources (data and knowledge-based) and producing a complex structured output. We have used an
evolutionary approach to combine several unitary items within a flexible structure and have built an initial
set of complex configurable items. Then, a content-based approach refines (in terms of preferences) these
candidates to offer a final recommendation. We conclude with the application of this approach to the healthy
diet recommendation problem, addressing its strengths in this domain.
European Union (Stance4Health) under Grant 816303
Ministerio de Ciencia e Innovación under Grant PID2021-123960OB-I00
MCIN (Ministerio de Ciencia e Innovación)/AEI (Agencia Estatal de Investigación)/10.13039/501100011033
ERDF (European Regional Development Fund) A way of making Europe.
And in part under Grant TED2021-129402B-C21 funded by MCIN (Ministerio de Ciencia e Innovación)/AEI (Agencia Estatal de Investigación)/10.13039/501100011033
European Union NextGenerationEU/PRTR (Plan de Recuperación, Transformación y Resiliencia)
"Program of Information and Communication technologies" at the University of Granad
A fuzzy-based medical system for pattern mining in a distributed environment: Application to diagnostic and co-morbidity
In this paper we have addressed the extraction of hidden knowledge from medical records using
data mining techniques such as association rules in conjunction with fuzzy logic in a distributed
environment. A significant challenge in this domain is that, although many studies are devoted
to analysing health data, very few focus on the understanding and interpretability of the data and
of the hidden patterns within them. Many health data
analysis studies have focussed on classification, prediction or knowledge extraction, and end users find
the results hard to interpret or understand. This is due to the use of black-box algorithms or
because the nature of the data is not represented correctly. This is why it is necessary to focus the
analysis not only on knowledge extraction but also on the transformation and processing of the data
to improve the modelling of the nature of the data. Techniques such as association rule mining and
fuzzy logic help to improve the interpretability of the data and treat it with the inherent uncertainty
of real-world data. To this end, we propose a system that automatically: a) pre-processes the database
by transforming and adapting the data for the data mining process and enriching the data to generate
more interesting patterns, b) performs the fuzzification of the medical database to represent and
analyse real-world medical data with its inherent uncertainty, c) discovers interrelations and patterns
amongst different features (diagnostic, hospital discharge, etc.), and d) visualizes the obtained results
efficiently to facilitate the analysis and improve the interpretability of the information extracted. Our
proposed system yields a significant increase in the comprehension and interpretability of medical data
for end-users, allowing them to analyse the data correctly and make the right decisions. We present
one practical case using two health-related datasets to demonstrate the feasibility of our proposal for
real data.
Junta de Andalucia P18-RT-1765
Ministry of Universities through the E